[feat] Enable MLA chunked prefill and KV cache reuse on SM121#15347
CodersAcademy006 wants to merge 2 commits into
Walkthrough
Changes: MLA SM121 allowlist expansion
Estimated code review effort: 1 (Trivial), ~2 minutes
Pre-merge checks: 5 passed
Signed-off-by: Srijan Upadhyay <srjnupadhyay@gmail.com>
65577f3 to 156e45b
I updated the SM version check logic in `py_executor_creator.py` to allow MLA chunked prefill and KV cache block reuse on SM121 (Blackwell). Specifically, I added SM121 (`121`) to the validation lists for both the `enable_block_reuse` and `enable_chunked_context` checks, so the executor no longer automatically disables these features on SM121 GPUs.

This resolves issue #15344, where these optimizations were being disabled on SM121 devices because the Python-side validator was missing SM121 from its allowlist. The underlying C++ kernels already support SM121, so this change makes the MLA optimizations available on this hardware.
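To illustrate the kind of check described above, here is a minimal sketch of a per-feature SM allowlist, assuming each flag is validated independently against its own list. Everything in it is hypothetical: `BLOCK_REUSE_SM_ALLOWLIST`, `CHUNKED_CONTEXT_SM_ALLOWLIST`, `MlaFeatureFlags`, and `validate_mla_features` are illustrative names, and the set entries other than `121` are placeholders, not the actual values or code in `py_executor_creator.py`.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

# Hypothetical allowlists: SM versions on which each MLA optimization is kept.
# Only 121 reflects this PR; 90 and 100 are placeholders for illustration.
BLOCK_REUSE_SM_ALLOWLIST = frozenset({90, 100, 121})
CHUNKED_CONTEXT_SM_ALLOWLIST = frozenset({90, 100, 121})


@dataclass(frozen=True)
class MlaFeatureFlags:
    """Hypothetical container for the two flags the validator may disable."""
    enable_block_reuse: bool
    enable_chunked_context: bool


def validate_mla_features(sm_version: int, flags: MlaFeatureFlags) -> MlaFeatureFlags:
    """Return new flags with any optimization unsupported on sm_version turned off."""
    block_reuse = flags.enable_block_reuse
    chunked_context = flags.enable_chunked_context

    # Each feature is checked against its own allowlist, so SM121 must be
    # present in both lists for both features to survive validation.
    if block_reuse and sm_version not in BLOCK_REUSE_SM_ALLOWLIST:
        logger.warning("Disabling KV cache block reuse for MLA on SM%d", sm_version)
        block_reuse = False
    if chunked_context and sm_version not in CHUNKED_CONTEXT_SM_ALLOWLIST:
        logger.warning("Disabling chunked prefill for MLA on SM%d", sm_version)
        chunked_context = False

    return MlaFeatureFlags(block_reuse, chunked_context)


if __name__ == "__main__":
    requested = MlaFeatureFlags(enable_block_reuse=True, enable_chunked_context=True)
    print(validate_mla_features(121, requested))  # both flags stay enabled
    print(validate_mla_features(80, requested))   # both flags are disabled
```

Keeping one list per feature mirrors the description of adding SM121 to the validation lists for both checks: if `121` were missing from only one list, only that feature would still be silently disabled.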